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ABSTRACT 

We present new constraints on the 21cm Epoch of Rcionization (EoR) power spectrum derived 
from 3 months of observing with a 32-antenna, dual-polarization deployment of the Donald C. Backer 
Precision Array for Probing the Epoch of Reionization (PAPER) in South Africa. In this paper, we 
demonstrate the efficacy of the delay-spectrum approach to avoiding foregrounds, achieving over 8 
orders of magnitude of foreground suppression (in mK^). Combining this approach with a procedure 
for removing off-diagonal covariances arising from instrumental systematics, we achieve a best 2ct upper 
limit of (52mK)^ for k = 0.11 h Mpc^^ at z — 7.7. This limit falls within an order of magnitude of 
the brighter predictions of the expected 21cm EoR signal level. Using the upper limits set by these 
measurements, we generate new constraints on the brightness temperature of 21cm emission in neutral 
regions for various reionization models. We show that for several ionization models, heating of the 
neutral intergalactic medium (IGM) is necessary to remain consistent with the constraints we report. 
Hence, we have suggestive evidence that by z = 7.7, the HI has been warmed from its cold primordial 
state, probably by X-rays from high-mass X-ray binaries or mini-quasars. 



1. INTRODUCTION 

The Donald C. Backer Precision Array for Probing the 
Epoch of Reionization (PAPER; Parsons et al. 2010)", 
a dedicated experiment employing non-tracking, dual- 
polarization dipole antennas in a 100-200-MHz band, 
is one of several radio telescopes aiming to measure 
the power spectrum of highly redshifted 21cm emis- 
sion to inform our understanding of cosmic reioniza- 
tion (Furlanetto et al. 2006; Morales & Wyithe 2010; 
Pritchard & Loeb 2012). Other such telescopes include 
the Giant Metre-wave Radio Telescope (GMRT; Paciga 
et al. 2013)12^ the LOw Frequency ARray (LOFAR; 
Yatawatta et al. 2013)^^, and the Murchison Widefield 
Array (MWA; Bowman et al. 2012; Tingay et al. 2012)^^^. 
PAPER consists of two distinct arrays: a 32-antcnna de- 
ployment at the NRAO site near Green Bank, WV, which 
is used primarily for engineering investigations and field 
testing, and a 64-antenna deployment in the Square Kilo- 
metre Array South Africa (SKA-SA) reserve in the Karoo 
desert near Carnarvon. PAPER South Africa (PSA) is 
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used primarily for science observations, and provides the 
data on which this paper is based. 

Recent advances in our understanding of how smooth- 
spectrum foreground emission can be isolated from the 
21cm EoR signature via delay-spectrum analysis (Par- 
sons et al. 2012b; hereafter P12b), and how maximum- 
redundancy array configurations can substantially im- 
prove the sensitivity of power spectrum measurements 
in the low signal-to-noise regime (Parsons et al. 2012a; 
hereafter PI 2a), have resulted in dramatic improvements 
in paper's prospects for constraining the power spec- 
trum of 21cm reionization. In this paper, we provide a 
first look at the application of delay-spectrum analysis 
to maximum-redundancy PAPER observations, culmi- 
nating in a new upper limit on the power spectrum of 
21cm emission from reionization. 

This paper is structured as follows. In §2 we describe 
the observations used, §3 details the calibration and anal- 
ysis pipeline, in §4 we describe the results of the analysis 
and the constraints we are able to place on reionization, 
and we conclude in §5 with a discussion of the efficacy of 
delay-spectrum analysis, as well as PAPER'S near-term 
prospects for improving upon these results. 

2. OBSERVATIONS 

In this section, we summarize the salient features of 
the observations used in our analysis. For more details 
regarding the PAPER telescope, its drift-scan observing 
mode, and primary beam shape, we refer the reader to 
Parsons et al. (2010), Jacobs et al. (2011), Pober et al. 
(2012), and Stefan et al. (2012). 

Our analysis centers on the XX and YY linear polar- 
ization products of 32 PAPER antennas deployed in the 
maximum-redundancy configuration shown in the left 
panel of Figure 1. All baselines within this core were used 
in the calibration steps described in §3. For this power- 
spectrum analysis, we use only a subset of the baselines: 
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Fig. 1. — Left: The 32-antenna, dual-polarization, maximum-redundancy array configuration used by the PAPER deployment in the Karoo 
Radio Observatory site in South Africa for power-spectrum observations. For power-spectrum analysis, we use the baselines connecting 
antennas between adjacent columns for north-south deflections of zero (0-16, 1-17, 2-18, 3-19, 8-16, 9-17, etc.), one (1-16, 2-17, etc.), and 
minus one (0-17, 1-18, etc.). Note the expanded scale in the vertical axis. Right: The 64-antenna, single-polarization, minimum-redundancy 
array configuration on the same site, whose imaging observations were used to characterize the spectrum of Pictor A in Jacobs et al. (2013). 
The ma.ximum redundancy positions are shown in green for comparison. 



the 28 antenna pairs connecting adjacent north-south 
columns in the strictly east-west direction (0-16, 1-17, 
2-18, 3-19, 8-16, etc.), as well as the 21 slanted just 
north of east (1-16, 2-17, 3-18, 8-17, etc.), and the 21 
just south of east (0-17, 1-18, 2-19, 9-16, etc.). Within 
each of these groups, baselines are all instantaneously re- 
dundant; they sample the same Fourier modes of the sky. 
Taken together, these 28-1-21-1-21=70 baselines represent 
the majority of the total instantaneous power-spectrum 
sensitivity of the 32-antenna, maximum-redundancy ob- 
servations (P12a). 

For each baseline, a 100-MHz band from 100 to 200 
MHz was divided into 2048 frequency channels, inte- 
grated for 10.7 seconds, and recorded. Observations 
spanned a 59-day period from Dec. 7, 2011 to Feb. 
4, 2012, corresponding to JD2455903.0 to JD2455962.2. 
In this period, observations were recorded for 55 days. 
Over this period, we selected for analysis observations in 
the LST range of 1:30 to 6:30, corresponding to a colder 
patch in the synchrotron sky that dominates the system 
temperature. In summary, the dataset on which this 
analysis is based consists of 55 full days observed with 
28-1-21-1-21 east-west oriented baselines of length 30 m, 
covering a window of 5 sidereal hours. 

For the purpose of characterizing the flux density of 
Pictor A to establish a flux scale for the power spectrum 
measurements, we also made use of data observed from 
JD2455747.1 to JD2455748.1 from a 64-antenna single 
(XX) polarization deployment of the PAPER array in a 
minimum redundancy configuration (see Figure 1, right 
panel) more suited for imaging. The details of these 
observations and their calibration are given in Jacobs 
et al. (2013) and Pober et al. (2013). Observation band- 
width, frequency resolution, and integration time match 
the maximum-redundancy observations reported above. 



3. CALIBRATION AND ANALYSIS 

All analysis described in this section were im- 
plemented using the Astronomical Interferometry in 



PYthon (AIPY)i5 software toolkit. This package 
is under revision control^®; our analysis is based 
on the version of AIPY committed under the hash 
tag dff2f4dba731240ced5cc883895a80cf714fcl2e. An 
overview of the analysis in this section is illustrated in 
Figure 2. 

3.1. Pre-processing 

In pre-processing, we use delay/delay-rate (DDR) fil- 
ters (Parsons & Backer 2009) to help identify radio- 
frequency interference (RFI) events, and as part of a data 
compression technique that reduces data volume by over 
a factor of 40. Further details are supplied in Appendix 
A. The result is that the 2048 original frequency chan- 
nels become 203, and the 60 original time samples per 10- 
minute file become 14, for an effective integration time 
of 43s. Our chosen output integration time is approxi- 
mately 40% shorter than dictated by the maximum fringe 
rate of a 300-m baseline; we chose a conservative interval 
for this — our first application of the data compression 
algorithm. These data contain all emission that rotates 
with the sky and falls in the delay range |t| < 1000 ns, 
corresponding to —0.5 < fcy < 0.5 h Mpc~^ at z — 7.7 
(164 MHz), but compressed in data volume by over a 
factor of 40. 

3.2. Phase Calibration 

Phase calibration is achieved using standard interfer- 
ometric self-calibration with the addition of constraints 
derived from the substantial instantaneous redundancy 
of the array (Wieringa 1992; Liu et al. 2010). GPS 
surveying and total-station laser ranging located all an- 
tennas to within ±10 cm from a perfect grid, based on 
agreement between the two surveying methods. These 
antenna coordinates are used to calibrate the electrical 
delay of each antenna signal path on the basis of redun- 
dancy. Because of the high degree of instantaneous re- 
dundancy in this array, we are able to derive per-antenna 

15 http://pypi.python.org/pypi/aipy 
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Fig. 2. — Data flow for the power-spectrum analysis described in §3. Solid black lines indicate data flow. Solid red lines indicate Monte 
Carlo simulations used to assess signal attenuation resulting from the analysis. Dotted linos indicate information flow for calculating errors 
via bootstrapping. 

polarization deployment of the PAPER array in a 
minimum-redundancy configuration more suited for 
imaging. The details of these observations and their cal- 
ibration are given in Jacobs et al. (2013), along with the 
resulting spectrum of Pictor A that is shown to be con- 
sistent with a single power law that is best fit by 



delays for each signal path relative to only two parame- 
ters that could not be solved for on the basis of redun- 
dancy: the relative electrical delay between the antennas 
in fiducial east-west and north-south baselines (0-16 and 
0-1, respectively). We find per-antenna delays averaged 
over each day of observation to agree within ±0.1 ns be- 
tween days. On this basis, we adopt a single per-antenna 
delay solution for the entire 55-day observation. These 
per-antenna delay solutions for all baselines are incorpo- 
rated into a global self-calibration step that derives the 
remaining two unknown phase parameters using the cen- 
tral components of Centaurus A, Fornax A, and Pictor 
A. 

3.3. Flux Calibration 
3.3.1. An Absolute Flux Scale from Pictor A 

The first challenge we encounter in flux calibration is 
establishing an accurate spectrum of a calibration refer- 
ence. Pictor A is chosen as a calibrator source of ne- 
cessity; given the grating lobes of the synthesized beam 
generated by our maximum redundancy array configura- 
tion, sidelobes of Pictor A and Fornax A dominate nearly 
all beam pointings. Of these two sources, Pictor A is the 
least resolved. 

Unfortunately, the spectrum of Pictor A is particularly 
poorly characterized in the 100-200-MHz band where 
PAPER operates. Previous measurements imply that the 
spectrum below 400 MHz deviates substantially from the 
power law (with spectral index a — —0.85) that holds at 
higher frequencies (Perley et al. 1997). There are also 
serious inconsistencies in the measured flux of Pictor A 
below 1 GHz (e.g. Slee 1995, Kuehr et al. 1981, Large 
et al. 1981, Burgess & Hunstead 2006), and a dearth of 
measurements at nearby frequencies in order to establish 
the spectral slope of Pictor A in this band. 

This shortcoming in the literature has recently been 
addressed with observations from a 64-antenna, single- 
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with Si5o = 388 Jy ± 9.4, with a = -0.76 ± 0.01. Error 
ranges indicate 76% confidence intervals. 

3.3.2. Gain Calibration Augmented by Redundancy 

For these maximum-redundancy observations, gain cal- 
ibration is derived on the basis of both redundancy and 
self-calibration. First, we correct for the known depen- 
dence of amplifier gain and cable attenuation on ambient 
temperature using measured temperature data and tem- 
perature coefficients that were characterized in labora- 
tory measurements (Parashare & Bradley 2009) and con- 
firmed in field measurements (Pober et al. 2012). These 
corrections correspond to approximately a 10% variation 
in amplitudes over the course of the observations. This 
is the only time-dependent parameter used in the gain 
calibration. 

Next, redundant self-calibration is used to derive the 
amplitude of each antenna signal path relative to a single 
overall bandpass function for each (XX,YY) polarization. 
This overall bandpass is then calibrated to the flux scale 
established in Jacobs et al. (2013) by phasing to the po- 
sition of Pictor A and averaging over all inter-column 
baselines. Because Pictor A is resolved at the 20% level 
on the longest (300m) East- West PAPER baselines, we 
use per-baseline estimates of resolution weights to down- 
weight and correct for resolution effects. These resolu- 
tion weights are estimated from the 90cm map of Pictor 
A presented in Perley et al. (1997). Averaged over all the 
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Fig. 3. — Calibrated source spectrum for Pictor A measured using XX (cyan) and YY (magenta) polarization data from the power- 
spectrum observations. Solid lines show power-law fits to the data in the range 120-170MHz. These plots characterize the calibrated flux 
scale of the measurements used in the power spectrum analysis. These measurements were calibrated to agree with the Pictor A spectrum 
in §3.3.1 (solid black). The ripples in the measured spectra are the result of beam sidelobes on Fornax A that are substantially higher for 
maximum-redundancy observations than the minimum-redundancy observations (plusses) used in Jacobs et al. (2013). 



baselines in the array, the inclusion of resolution effects 
change the measured flux by less than 5%. As in Jacobs 
et al. (2013), a fringe-rate filter is applied to the result- 
ing beam to suppress sidelobes that introduce variations 
deviating more than ±0.1 mHz from the fringe rate of 
the source in question. 

To derive a source spectrum from a drift-scan source 
profile, we average time samples using weights derived 
from a model of PAPER'S primary beam response as 
an effective approximation of inverse- variance weighting 
(Pober et al. 2012), and average over 55 days of drift- 
scan observations. A 9th-order polynomial is fit to bring 
each measured spectrum in accordance with the unpolar- 
ized model of the Pictor A spectrum described in §3.3.1. 
This polynomial uses a limit number of terms to prevent 
the coupling of beam sidelobes, which are the most likely 
cause of the residual gain variation in Figure 3, into the 
overall gain calibration. While such sidelobes are dif- 
ficult to distinguish from intrinsic frequency structure, 
we err on the side of caution to avoid introducing any 
frequency-dependent gain terms not in the data. The 
resulting Pictor A spectrum, plotted for each polariza- 
tion in Figure 3 along with the reference spectrum de- 
rived from minimum-redundancy PAPER observations 
(Jacobs et al. 2013), characterizes the flux scale used in 
the subsequent power-spectrum analysis. 

Following calibration to a consistent flux scale, XX and 
YY polarizations for each baseline are directly summed 
to form a coarse estimate of the Stokes I parameter. This 
simplistic construction of Stokes I neglects differences 
in the beam responses of each linear polarization, and 
as a result, contains contributions from Stokes Q away 
from beam center (Jelic et al. 2010; Moore et al. 2013). 
Nonetheless, for analysis of a single baseline type, cor- 
recting for direction-dependent gains is highly non-trivial 
at best, and we attempt no such correction here. On the 
basis of the asymmetry of the PAPER beam model un- 
der 90° rotation, we estimate that this construction of 
Stokes I will contain approximately 4% of Stokes Q as 



well (Moore et al. 2013). For a more thorough treat- 
ment of polarization in the context of drift-scanning ar- 
rays with delay-spectrum analysis, we refer the reader to 
Moore et al. (2013). 

3.4. Wideband Delay- Spectrum Foreground Separation 

The next major step in our data reduction pipeline 
is to remove a best estimate of foreground emission aris- 
ing from the galactic synchrotron and extragalactic point 
sources. This is an essential step, as foreground emission 
exceeds the expected level of EoR emission by 9 orders of 
magnitude (Pober et al. 2013), and can conceal low- level 
RFI and crosstalk systematics. Moreover, as described in 
P12b, smooth-spectrum foreground emission can corrupt 
higher-order fc-modes in power-spectrum measurements 
as a result of sidelobes that arise from RFI flagging and 
the finite (windowed) bandwidth used in the line-of-sight 
Fourier transform. Although power-spectrum measure- 
ments that aim to constrain 21cm reionization must be 
limited to ~10 MHz to avoid significant signal evolution 
within the band, foreground emission is generally coher- 
ent over the entire 100-MHz band of PAPER observa- 
tions. 

A delay-domain filter applied over a wide bandwidth 
can achieve a much sharper degree of foreground isolation 
than is possible in any sub-band, and this can be used to 
effectively model and remove a smooth foreground model 
from each sub-band of interest. Following the procedure 
outlined in P12b and implemented in Pober et al. (2013), 
we perform a delay transform of each integration over the 
entire observing band, according to the equation 
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where V{v) is the measured visibility as a function of 
spectral frequency v, r is delay — the Fourier com- 
plement to J/, W{v) is a Blackman-Harris windowing 
function (Harris 1978) used for minimizing sidelobes and 
band-edge effects, Siv) encodes the frequency-dependent 
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Fig. 4. — RMS temperatures of raw (black), frequency-differenced (magenta), and the time-differenced (cyan) visibilities in various stages 
of analysis. The upper triplet of curves correspond to raw visibilities for the set of 28 east-west baselines, averaged in 43-second LST bins 
over 20 days with a channel bandwidth of 490 kHz. The second triplet of curves from the top corresponds to visibilities after the application 
of the wide-band delay filter described in §3.4. The third triplet shows the result of further averaging delay-filtered visibilities over the set 
of 28 baselines. The lowest triplet shows the same baseline-averaged visibilities after the application of the fringe-rate filter described in 
§3.5. The lower cyan curve is significantly lower than the others as a result of the near-orthogonality, in fringe-rate domain, of a high-pass 
time-differencing filter and the low-pass fringe-rate filter; it is not a valid representation of the noise. Scaling the wide-band delay filtered 
curves by the effective integration time and bandwidth, we determine a system temperature of 560 K at 160 MHz. 

moval to remove systematics that were previously un- 
detected beneath the bright, smooth-spectrum emission. 
Crosstalk removal proceeds by subtracting the 1-hr time- 
average of the visibilities for each baseline from each 
integration. This process, described in Parsons et al. 
(2010), distinguishes oscillating fringes associated with 
celestial emission from the static phase bias associated 
with crosstalk. This crosstalk-removal filter essentially 
constitutes a high-pass fringe-rate filter, as described in 
§3.5. The width of the stop-band of the crosstalk fil- 
ter is much narrower than low-pass fringe-rate filters de- 
scribed in that section. Nighttime data are averaged in 
LST over the 55-day observation using 43-second time 
bins matching the integration interval of the data after 
the compression step described in §3.1. 



sample weights that result from RFI flagging, and '^' 
denotes the delay transformation. Following the proce- 
dure outlined in Parsons & Backer (2009), we deconvolve 
the delay-domain kernel that results from the product 
W{iy) ■ S{iy) using a CLEAN algorithm (Hogbom 1974), 
restricting model components to fall inside of 15 ns be- 
yond the horizon limit of each baseline. We chose a 
15 ns standoff from the horizon to suppress the first 
level of supra-horizon emission that results from the in- 
herent width of S{t). The resultant CLEAN model 
is subtracted from V{t) to effect a delay-domain filter 
suppressing smooth-spectrtmi emission. On the 30-m, 
nearly east-west baselines used in this analysis, in a band 
centered at 164 MHz, this corresponds to a cutoff at 
k = ±0.06 h Mpc^^. This delay filter is solely respon- 
sible for -60 dB of suppression (in mK^ power) of the 
foreground signal seen between the unfiltered signal (top 
black curve) and the final residual (lowest black curve) 
shown in Figure 4. 

After the removal of the bulk of the foregrounds, 
the signal remaining for each baseline is visibly noise- 
dominated. We sum the dynamic spectra of the 28 re- 
dundant baselines in columns of four, leaving seven inde- 
pendent time-series of frequency-dependent visibilities, 
each with quadruple the sensitivity of a single base- 
line. At this point, we apply a final pass of RFI exci- 
sion, flagging 3cr outliers, and proceed to crosstalk re- 



3.5. Fringe-Rate Filtering 

In the last step prior to forming power spectra, we ap- 
ply a fringe-rate filter to effect time-domain integration, 
using the effective time interval that a baseline measures 
a single fc-mode to integrate coherently (with noise de- 
creasing as y/i, in units of mK), before measurements at 
different times represent independent modes that must 
be squared before further integration (with noise now 
decreasing as yi, in units of mK^). 

One way of handling this additional integration is via 
gridding in the uw-plane. Each measured visibility in a 
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wide-field interferometer represents the integral over a 
kernel in the uw-plane that reflects the primary beam of 
the elements (Bhatnagar et al. 2008; Morales & Mate- 
jek 2009) and the w component of the baseline (Corn- 
weU et al. 2003). As noted in SuUivan et al. (2012) and 
Morales & Matejek (2009), in order to optimally account 
for the mode- mixing introduced by these kernels, grid- 
ding kernels must be used that correctly distribute each 
measurement among the sampled uw-modes, such that, in 
the ensemble average over many measurements by many 
baselines, each ww-mode becomes an optimally weighted 
estimator of the actual value given the set of measure- 
ments. 

However, this approach has a major shortcoming when 
applied to maximum-redundancy array configurations In 
order to maximize sensitivity, such configurations arc set 
up to deliberately sample identical visibilities that reflect 
the same combinations of modes in the uv plane, with few 
nearby measurements (P12a). As a result, such array 
configurations tend to lack enough measurements of dif- 
ferent combinations of uv modes to permit the ensemble 
average to converge on the true value. Said differently, 
maximum-redundancy array configurations tend to pro- 
duce measurement sets that, when expressed as linear 
combinations of ■uw-modes of interest, are singular. 

Our alternative approach avoids this and many of the 
difficulties outlined in Hazelton et al. (2013) by applying 
a carefully tailored fringe-rate filter to each time series of 
visibility spectra. As outlined in Appendix A in the con- 
text of data compression, we take the Fourier transform 
of the time series in each channel and apply a low-pass 
filter that preserves fringe-rates that geometrically corre- 
spond to sources rotating on the celestial sphere. In order 
to avoid introducing undesirable frequency structure, we 
apply the same filter to each channel even though, max- 
imum fringe-rates are generally frequency-dependent. In 
a future paper, we will explore the idea of employing 
fringe-rate filters that purposely down- weight fringe-rate 
modes on the sky according to the expected signal-to- 
noise ratio in each mode. Such filters would essentially 
correspond to a one-dimensional implementation of the 
inverse primary beam uw-gridding discussed in Morales 
& Matejek (2009), and have many features in common 
with m-mode synthesis described in Shaw et al. (2013). 

Since thermal noise scatters equally into all fringe rate 
bins, applying a filter that passes only fringe rates cor- 
responding the celestial emission has the effect of de- 
noising the data. We apply such a filter to the data, 
choosing the bounds of the filter to match the geometric 
bounds set by a 30-m east-west baseline, according to 
the equation 

lb. 



/max — 



-'cql 



(3) 



where /max is the maximum fringe rate, beq is the base- 
line vector projected parallel to the equatorial plane, c 
is the speed of light, w® is the angular frequency of the 
Earth's rotation, and v is the spectral frequency. The 
passband of this filter corresponds to an effective integra- 
tion time of 780 s. We note that this filtering could have 
been applied during the data compression described in 
§3.1, but was implemented separately to enable the com- 
pression to work uniformly over all baselines in the array 
without additional information about antenna location. 



After applying this filter, we transform the data back 
to time domain in preparation for forming power spectra 
via the delay transform. It should be noted that, in time 
domain, the data are now heavily over-sampled; adjacent 
samples are no longer statistically independent. Hence, 
when averaging power-spectra versus time, noise will not 
beat down according to the strict number in samples, but 
rather, according to the actual number of statistically 
independent samples underlying the time series. 



3.6. Narrow-band Delay- Transformation, 
Cross-Multiplication 



and 



In the final stage of our core analysis, we apply the 
delay transform in Equation 2 to a 20-MHz band cen- 
tered at 164 MHz, again using a Blackmann-Harris win- 
dow that yields an effective (noise-equivalent) bandwidth 
of 10 MHz, centered at redshift z = 7.7. In this pa- 
per, we limit our analysis to a single cosmological bin 
where, out of the entire PAPER observing bandwidth, 
the noise and foreground residual in Figure 4 is lowest. 
As discussed in P12b, the r-modes that result from the 
delay transform have a peaked response in k\\ that al- 
lows each (u,z;,r)-mode to be interpreted as sampling 
the mode k = 2Tr{u/X,v/X,T/Y), with X and Y rep- 
resenting conversion factors from angle and frequency to 
comoving distance, respectively. As such, we use fcy and 
T-modcs interchangeably in the following discussion. 

In contrast to the wide-band delay filtering described 
in §3.4, where a CLEAN deconvolution with a restricted 
range of model components was used to suppress the 
sidelobes of smooth-spectrum emission resulting from un- 
smooth sampling functions, the removal of known covari- 
ances between modes in V{t) introduced by windowing 
and sampling functions must include all r-modes, and 
ideally, should be a linear process with known noise prop- 
erties. Fortunately, in the band chosen for this analysis, 
the sampling function S{i>) in equation 2 is nearly unity 
at all frequencies, thanks to the exquisite RFI environ- 
ment at the Karoo site in South Africa. As a result, when 
we encode the couplings. A, introduced by the window- 
ing function according to the linear equation 

x' = Ax, (4) 

where x are the true delay-domain visibilities, and x' are 
what we measure, we have that A is invertible. Hence, 
we simply apply its inverse to the measured delay modes 
to diagonalize the covariances introduced in the delay 
transform. 

From these delay-transformed visibilities, we construct 
unbiased estimators of the power spectrum, P(k), by 
cross-multiplying delay spectra between the seven base- 
line groups constructed in §3.4, and summing over all 
cross-multiples, according to the equation: 
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which follows from equation 12 of PI 2a, with A being 
the observing wavelength, k-Q is Boltzmann's constant, 



X^Y is a cosmological scalar with units of 2£ 

is the angular area^^, B is the bandwidth, ( 
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As described in detail in Appendix B, the angular area used to 
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Gates the ensemble average over instantaneously redun- 
dant baseline measurements indexed by i,j, and V{T,t) 
is the delay-transformed visibility, expressed in terms of 
delay t and time t. We use t as a subscript on k to 
denote the different modes sampled by a baseline as the 
sky rotates, and r to indicate the dependence of k on the 
delay mode in question. 

Equation 5 represents the diagonal-covariance limit of 
an optimal quadratic estimator (Dillon et al. 2013a; Liu 
& Tegmark 2011), which we review in detail and adapt to 
delay-transformed visibilities in Appendix C. The diag- 
onal limit generates optimal power spectrum estimators 
in the simple case of ideal signal covariance among statis- 
tically independent r-modes. In theory, the translation- 
invariance of the cosmological signal (along with the in- 
terchangeability of t and k) ensures that the signal co- 
variance is diagonal. Over narrow frequency bands such 
as the one used for these measurements, a similar in- 
variance (and thus, diagonality) along the frequency di- 
rection should hold for the noise and foregrounds (Dil- 
lon et al. 2013a). In practice, we encounter substantial 
systematics whose covariance deviates from this ideal. 
In §3.7, we examine how off-diagonal covariances arising 
from systematics can be suppressed in a straightforward 
way and incorporated into Equation 5 to produce power 
spectrum estimators that are to leading order, optimal. 
Because this later analysis follows the same trajectory 
as the analysis presented, we will first bring the current 
analysis to completion before examining how it is modi- 
fied to incorporate off-diagonal covariances. 

For N redundant samples of a k-mode, N{N — l)/2 
cross-multiples are constructed. At this stage, we do not 
use i > j pairings, choosing instead to preserve the imag- 
inary component of the power spectrum estimator, even 
though P2i(k) is real- valued, as a diagnostic. In the final 
power spectrum estimate, we drop the imaginary compo- 
nent, which is equivalent to including the i > j baseline 
pairings. Hence, we ultimately have iV^ — N measure- 
ments of P(k), with noise decreasing proportional to N 
in mK^ units, as would be expected for coherently inte- 
grating redundant samples before squaring them. 

Next, we average over measurements of independent 
k-modes that statistically reflect the same underlying 
power spectrum, to produce our best power-spectrum es- 
timate, 

fc3 



A^i(fc) 



27r2 



P(Mtr) 



|kt,| = fe' 



(6) 



where the three-dimensional symmetry of the power 
spectrum is invoked to average over all independent mea- 
surements of modes in a shell of |k| — fc, with indepen- 
dent measurements indexed here by t. As described in 
§3.5, the number of independent modes that are aver- 
aged (with noise decreasing with number of modes, M , 

as VM in mK^ units; see P12a) is determined by overall 

normalize high-redshift 21cm power speetrum measurements (e.g., 
Q in Equation 5) is proportional to the integral of the squared beam 
power over angular area (Slpp; equation BIO). This contrasts the 
standard beam area (Op; equation Bll) that is used to relate flux 
density to a brightness temperature. Since Equation 5 relates a 
measured visibility in units of brightness temperature to P(k), a 
factor of flp has already been applied to convert Jy to mK. In this 
case, Q indicates the remaining factor of f2pp, which for PAPER 
is 0.31 sr. 



observing window and the number of fringe-rate bins that 
are preserved in the fringe-rate filtering process. Noise 
levels are estimated using bootstrapping, as described in 
§3.8. Since we have not decimated the number of inte- 
grations to the critical sampling rate corresponding to 
the width of the applied fringe-rate filter, M is not the 
number of integrations. However, we are free to aver- 
age the power spectrum estimates for each integration, 
even though nearby samples do not have statistically in- 
dependent noise, understanding that noise will decrease 
according to the number of underlying independent sam- 
ples. 

We also apply this same process for constructing 
power-spectrum estimators to data that include smooth- 
spectrum foreground emission before it was suppressed 
with the wide-band delay filter described in §3.4. While 
these data arc not expected to yield accurate measure- 
ments of P(k) at higher k owing to sidelobes of data flag- 
ging and low-level systematics, as mentioned previously, 
they do yield effective measurements of the bright fore- 
ground emission that falls within the limits of the wide- 
band delay filter (top-right panel. Figure 5). In order 
to present a complete picture of the power spectrum of 
emission on the sky that is valid at all fc-modes, we stitch 
together the final power spectrum using measurements 
derived from foreground data at /c-modes where emis- 
sion is suppressed by the wide-band delay filter, and us- 
ing measurements from the foreground-suppressed data 
elsewhere. 

The result of the analysis up to this point is the power 
spectrum illustrated by the cyan curve in Figure 6, which 
exhibits residual off-diagonal covariance structure that is 
illustrated in the top-right panel of Figure 5. 

3.7. Suppression of Off-Diagonal Covariances Arising 
from Systematics 

The off-diagonal covariances shown in the upper-right 
panel of Figure 5, which are estimated empirically from 
measured data, are indicative of systematics corrupting 
what should nominally be nearly statistically indepen- 
dent estimators of fc-modes of reionization. This diag- 
nosis is reinforced by the substantial variation of the co- 
variance structure between different baselines; we expect 
the sky signal, whether from foregrounds or reionization, 
to be common to all baseline cross-products (as in the 
top-left panel of Figure 5). Hence, we now turn to inves- 
tigating techniques for removing these systematics while 
retaining, to the greatest extent possible, the reioniza- 
tion signal we are after. As we describe briefly here and 
in detail in Appendix C, this can be achieved by using 
the quadratic estimator formalism (Dillon et al. 2013a; 
Liu & Tegmark 2011). 

As derived in Appendix C, the special case of applying 
optimal quadratic estimators to delay-domain visibilities 
can, to leading order, be approximated as a modification 
of the visibility. In brief, knowledge of the covariances 
allows off-diagonal leakages between delay bins to be pre- 
dicted and removed. If we take the output of the off- 
diagonal covariance removal process described in Equa- 
tion C20 a perturbed visibility, V'{t) from Equation C21, 
such that then for this modified visibility. Equation 5 en- 
codes the remainder of the optimal estimator formalism. 

There are, however two subtle issues to address in ap- 
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Fig. 5. — Upper-left: The covariance matrix of delay modes from two representative baselines generated by applying a narrow-band delay 
transform (§3.6) to data without the application of a wide-band delay filter; upper-right: Similar to upper-left, but applied to data after 
the application of a wide-band delay filter (§3.4); lower-left: Similar to upper-right, but after removing per-baseline off-diagonal covariance 
terms that deviate from the average (§3.7); lower-right: Similar to lower-left, but after removing off-diagonal covariance terms common to 
all baselines associated with the central seven delay modes. This final step attenuates the expected level of the rcionization signal in these 
central modes by '~30%, but only attenuates other modes by ~ 5%. 



plying the ofF-diagonal covariance removal process de- 
scribed in Appendix C. The first problem pertains to our 
limited ability to empirically deduce the true covariances 
present in our data. Because we have a limited number 
of independent samples from which to estimate covari- 
ances in the data, we measure residual noise and signal 
covariances that, in an infinite sample set, would not ex- 
ist. If we naively subtract these residual covariances, we 
over-fit the noise and suppress the 21cm signal. 

There is a simple prescription for avoiding this prob- 
lem, given that, as the top-right panel of Figure 5 illus- 
trates, the dominant systematics we see in the covari- 
ance of fc-modes exhibit a baseline-dependent structure. 
Since celestial emission, whether from foregrounds (top- 
left panel. Figure 5) or 21cm rcionization, should have 
identical structure among redundant baselines, we may 
subtract the average covariance among many baseline 
cross-products (the "panels" in each subplot of Figure 5) 
from each individual cross-product to remove the resid- 
ual signal covariance and leave behind only the covari- 
ance of noise and systematics from which off-diagonal 
covariances can be removed — even if they are residuals 
that would not be present in the true covariance matrix 



— without attenuating the desired celestial signal. This 
lossless removal of systematics is another substantial ben- 
efit of the highly redundant configurations presented in 
P12a. 

We apply this technique for each baseline cross- 
product, using all baselines except the two being cross- 
multiplied for estimating the average covariance in or- 
der to avoid, to the greatest extent possible, coupling 
baseline-dependent systematics into each estimate of the 
common-mode covariance. Iterating this process a mod- 
est number of times (two or three), produces an improve- 
ment in the results as initial baseline-specific system- 
atics are first removed from the baseline, allowing an 
improved estimate of the average covariance to be sub- 
tracted, such that the remainder of the baseline-specific 
systematics are removed. Further iteration is not neces- 
sary, as the process rapidly converges to the result shown 
in the bottom- left panel of Figure 5. 

This process, though quite effective, introduces a sec- 
ond issue that must be addressed. In Equation 5, we 
were careful to exclude products between the same base- 
line in order to avoid incurring the noise bias that would 
result. The benefits of avoiding this bias far outweigh 
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Fig. 6. — Power spectra at z = 7.7 derived from the 55-day PAPER observation described in §2. In both panels, solid cyan depicts 
2(T upper limits derived from PAPER observations without the removal of off-diagonal covariance terms, and black indicates the final 
measured power spectrum with 2a confidence intervals. The measurements in the right panel are weighted averages of positive/negative fcii 
contributions. The horizon limit (vertical dashed) illustrates the boundary within which emission has been filtered out in delay space and 
re-inserted after the formation of power spectra. Dashed cyan illustrates the predicted noise power spectrum from Parsons et al. (2012a) 
for a system temperature of 560 K. The yellow triangles indicate 2cr upper limits reported in Paciga et al. (2013) at 2 = 8.6. Magenta 
illustrates a fiducial model at 50% ionization (Lidz et al. 2008). At k Ri 0.11 h Mpc~^, we report an upper limit on A|]^(fc) of 2700 mK^ 
with 2cr confidence. 



the slight improvement in sensitivity that including such 
"auto-products" would produce. Unfortunately, by using 
the average of many baselines to estimate and subtract 
an average covariance from each baseline cross-product, 
and then subtracting off-diagonal terms in the residual, 
we couple the baseline-averaged noise into the data from 
each baseline. The result is a low-level residual noise bias 
approximately equal to the noise in the power-spectrum 
estimate. 

The most straightforward approach we have found for 
eliminating this residual noise bias, other than direct sub- 
traction (which introduces additional complications), is 
to divide baselines into two separate groups of approxi- 
mately equal size. Within each group, we apply the off- 
diagonal covariance removal process, including the sub- 
traction of an average covariance, using data only from 
baselines within that group. Then, to avoid incurring a 
noise bias, we use only cross-products of baselines be- 
tween these two groups to estimate J^(k). By excluding 
intra-group cross-products, this approach sacrifices a fac- 
tor of approximately v2 in sensitivity (in mK^), but as 
before, we find the benefits of avoiding noise bias to out- 
weigh the loss in sensitivity. 

The last step in the process of suppressing off-diagonal 
covariances in our data is, for select modes, to relax the 
constraint of not subtracting a baseline-averaged covari- 
ance for selected modes, even if it results in overfitting 



the noise and the signal suppression that is associated 
with it. Particularly, we note the interior seven /c-modes 
(the five inside of the horizon limit shown in Figure 6, 
as well as the first modes beyond this limit on either 
side) that we measure are more than an order of mag- 
nitude brighter than other modes, and are so corrupted 
by smooth-spectrum foreground emission that, barring a 
heroic effort aimed at modeling and removing these fore- 
grounds, they are unlikely to be useful for constraining 
high-redshift 21cm emission. We find it advantageous to 
remove all off-diagonal covariances associated with these 
modes, even if it means overfitting the noise. The re- 
sult of this process is shown in the bottom-right panel of 
Figure 5. 

Since we have overfit the noise in this final step, it now 
becomes necessary to investigate how we have affected 
the 21cm signal that we aim to measure, since failure to 
account for signal attenuation can lead to erroneous con- 
straints. We use Monte Carlo simulations (Masui et al. 
2013; Paciga et al. 2013; Switzer et al. 2013) to esti- 
mate the expected signal attenuation through the anal- 
ysis chain from §3.6 onward. In these simulations, we 
apply the analysis pipeline used on the data, including 
off-diagonal covariance suppression, to simulated data 
consisting of random realizations of a flat-spectrum sig- 
nal common to all baselines that rotates with the celes- 
tial sphere. When applying the off-diagonal covariance 
removal process, we use the empirically determined co- 
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variances internal to the simulated data to investigate 
the effects of overfitting the noise. On average for these 
data, removing the off-diagonal covariances associated 
with these seven modes results in a ~ 5% reduction in 
signal amplitude (in mK^) for other modes, and a ^ 30% 
reduction for the seven modes in question. We compen- 
sate by dividing by these signal attenuations when re- 
porting the power spectrum limits in Figure 6. To test 
the effect of our analysis on noise statistics, we also apply 
the identical analysis that is applied to the data, includ- 
ing the removal of off-diagonal covariances determined 
empirically from the observed data, to a random signal 
that emulates pure thermal noise, and find that noise 
that is uncorrelated with the data is not attenuated by 
this analysis. 

We note that none of the analysis described above con- 
stitutes formal model subtraction. Because we estimate 
covariances from data, it is possible to over-fit noise and 
subtract signal, but we take steps to avoid this, and 
to compensate for signal attenuation where subtraction 
does happen by estimating signal attenuation via Monte 
Carlo simulations. If the full covariances were known a 
priori, this analysis would be strictly lossless (Tegmark 
1997), and as shown in Appendix C, optimal to leading 
order. 

3.8. Estimating Residual Noise 

One consequence of the process for removing off- 
diagonal covariances described above is that baseline- 
specific covariances, including residual covariances from 
thermal noise, are heavily suppressed. As a result, it 
is not possible to estimate noise residuals simply from 
the variation between baseline cross-products at the con- 
clusion of this process. Instead, we use bootstrapping 
of baseline samples at the input of the off-diagonal co- 
variance removal process to estimate error in the output 
power-spectrum estimate. 

There are, however, a few subtleties in applying boot- 
strap resampling in this context. One complication is 
that, as described in §3.7, in order to avoid incurring 
a noise bias, it is vital that the removal of off-diagonal 
covariances proceed on independent groups of baselines. 
To reflect that fact that our power spectrum estimator is 
unbiased, it is important to restrict the resampling with 
replacement to avoid including data from the same base- 
line into the two independent baseline groups. This is 
done by first randomly assigning baselines uniquely to 
one of the two groups, and then applying sampling with 
replacement only within each of these groups. At each 
iteration of resampling, the assignment of baselines to 
groups is randomized. 

Another issue in applying bootstrapping to this case 
pertains to how a second bias can be mistakenly included 
if we are not careful to keep track of repeated entries of 
the same baseline data within a group. In this case, the 
problem results from averaging used to suppress residual 
signal covariances in the case that such cross-products 
may actually be the product of a baseline's data with 
itself. The noise bias in these auto-products is coupled 
into the average of the covariances, and biases the result 
of the off-diagonal covariance removal in each baseline 
group. In the cross- multiplication of baselines between 
groups, these biases (which are positive) couple into the 
result. 



The solution in this case is to exclude auto-products 
from the determination of the average covariance among 
baseline cross-products. Since only a few indepen- 
dent cross-products are necessary to distinguish baseline- 
specific features from a residual signal covariance, this 
restriction is relatively harmless. However, it does re- 
quire that there be at least two independent baselines in 
each resampled baseline group. By imposing this restric- 
tion, as well as the restriction to not repeat baselines be- 
tween two independent groups, we limit the sample space 
that bootstrapping explores. It is therefore important to 
check that there remain enough data permutations to 
adequately estimate errors. For these restrictions on the 
resampling of seven baselines sums, there remain 3815 
statistically distinct permutations. Since this is a much 
larger number than the number of independent samples 
we have, the limiting factor in estimating errors will be 
the number of samples, not the number of bootstrap re- 
samplings. 

The mean and 2ct errors shown in Figure 6 are derived 
from 100 bootstrap re-samplings. The errors we report 
do not include covariances between fc-modes, which are 
not treated in bootstrapping. However, as implied in 
(Dillon et al. 2013b), if these modes are not added to- 
gether in binning, then the variances derived from boot- 
strapping are sufficient to characterize the errors. 

4. RESULTS AND DISCUSSION 
4.1. Foreground Suppression and Noise Levels 

Figure 7 shows images of a single frequency channel us- 
ing earth-rotation synthesis, including all 496 baselines 
of paper's 32-antenna maximum-redundancy array. As 
a reminder, the power spectra we report in §4.2 is esti- 
mated directly from a restricted set of visibilities. These 
images are for diagnostic purposes. As indicated by the 
marked reduction of emission between the upper-right 
and lower-left panels, the application of the delay filter 
suppresses peak emission by at least 3 orders of magni- 
tude. The further decrease in residual emission with the 
application of the fringe-rate filter between the lower- 
left and lower-right panels indicates that residuals in the 
lower-left panel are noise-dominated, with the lower-right 
panel showing that peak emission from the raw image has 
been suppressed by 3.3 orders of magnitude. 

This result is paralleled in Figure 4, which shows the 
RMS temperature of visibilities as a function of frequency 
in LST bins between 2:00 and 5:00, before and after 
the application of the wide-band delay filter, as well as 
how noise in the filtered data behaves through the var- 
ious integration steps described in §3. As can be seen 
by the ratio of the unfiltered data to the residual, the 
wide-band delay filter is responsible for suppressing fore- 
ground emission in the middle of the band by nearly 
three orders of magnitude (in mK), demonstrating the 
efficacy of this approach for isolating smooth-spectrum 
foregrounds from fc-modes that might be used to detect 
21cm reionization. This filter substantially outperforms 
time-differencing or frequency-differencing the visibilities 
for foreground suppression because of the sinusoidal na- 
ture of the fringes. 

The gap in the lower set of curves of Figure 4 between 
the raw and differenced visibilities indicates the presence 
of an interfering signal that varies relatively smoothly 
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Fig. 7. — Synthesized beam and dirty images made with the same LST-averaged, maximum-redundancy PAPER observations used in 
the power-spectrum analysis, but including all 496 baselines in the array. For imaging, we used every tenth integration (each with an 
integration time of 43s) in a single 500-kHz channel centered at 160 MHz in an LST range of 2:00 to 5:00, phased to 0=3:00, (5=-30:40°. 
Plotted left to right, top to bottom, are the synthesized beam, an unfiltered dirty image, a dirty image with a wide-band delay filter at 15 
ns past the horizon limit, and a dirty image with both the wide-band delay filter and a fringe-rate filter applied. The synthesized beam 
panel has been normalized to show peak beam response at the maximum of the color scale range. 



in time and frequency. Such characteristics are gener- 
ally indicative of foreground emission, and suggest that 
near the limits of the sensitivity of these observations, 
foregrounds are not completely suppressed. Examin- 
ing the set of visibilities in Figure 4 from which fore- 
grounds have been removed via derivatives in time or 
frequency, we find a ratio of approximately 5 between 
the single-baseline and baseline-averaged curves in the 
140-180 MHz frequency range. This ratio is roughly con- 
sistent with the value of 5.3 that would be expected for 
integrating 28 baselines with independent noise. Below 
130 MHz, this ratio decreases substantially, indicating 
the presence of a correlated signal between different base- 
lines. This correlated signal is shown to be suppressed by 
time-differencing and frequency-differencing filters, indi- 
cating that it is likely to be the result of unsuppressed 
foregrounds near the minimum delay threshold imposed 
by the wide-band delay filter. This interpretation is con- 
sistent with the power spectrum reported in Figure 6. 

Scaling the noise-dominated single-baseline curve by 
the square-root of the number of measurements con- 
tributing to each temperature residual, we determine the 
system temperature at 160 MHz to be 560 K. This value 
is in excess of the expected level of sky noise arising from 
galactic synchrotron emission, but represents a substan- 
tial improvement over the system temperature in Parsons 
et al. (2010). 

Comparing the set of visibilities derived from data af- 
ter the application of a fringe-rate filter, which, although 



weighted for a sharp cutoff in fringe-rate domain, effec- 
tively corresponds to a tenfold increase in integration 
time over the 43-second LST bins used in the upper 
plot. The ratio between the curves with and without 
fringe-rate filtering is approximately 3 in the 140-170 
MHz frequency range. This ratio is roughly consistent 
with the value of 3.2 that would be expected for the ten- 
fold increase in integration time that the application of 
the fringe-rate filter represents. The ratio between the 
filtered and unfiltered curves is substantially lower on ei- 
ther end of this band indicating a departure from noise- 
like behavior. As in other cases, the suppression of the 
systematic with frequency-differencing suggests that this 
refiects residual foregrounds that are relatively smooth. 
Nonetheless, the increasing systematics near the edge of 
the band form the basis of one of the arguments for ob- 
serving with a wide instantaneous bandwidth. It also 
provides the motivation for limiting our power spectrum 
analysis in this initial investigation to the portion of the 
spectrum that appears consistent with noise. 

4.2. Measured Power Spectrum 

In Figure 6, we show the spherically averaged 3D power 
spectrum derived from our measurements. In this figure, 
the central five data points in the range —0.06 < k\\ < 

0.06 h Mpc~^ fall within the horizon limit of the delay 
spectrum, and come from data without the application 
of the wide-band delay filter. We also illustrate the data 
before (cyan) and after (black) the application of the off- 
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diagonal covariance removal described in §3.7. Averaging 
over ±fc||, we place a 2g upper limit on A2i(fc) of 2700 

mK^ for k « 0.1 h Mpc^^. This limit is more than an 
order of magnitude more stringent than the previous best 
upper limit, which was recently revised to 61,500 mK^ 
at /c = 0.5 h Mpc"^ at ^ = 8.6 (Paciga et al. 2013). 

As this plot illustrates, the residual foreground signal 
that we detect in the frequency domain (see Figure 4) 
is concentrated at low |A:j[|-modes in delay space, with 
the largest excess in a region just beyond the horizon 
limit described in P12b. In the range —0.22 < fcy < 

0.22 h Mpc^^, bins show evidence of a systematic bias 
that makes them inconsistent with zero with high signif- 
icance. The source of this biasing signal is likely to be 
foreground emission, as predicted in P12b, but instru- 
mental systematics are another possibility. We observe 
a modest asymmetry in the detected signal with respect 
to positive and negative fcii that is presumed to arise 
from foreground emission. The underlying cause is still 
under investigation, but may relate to the fact that mea- 
surements at positive fcy, for eastward-oriented baselines, 
are in closer proximity to emission on the eastern hori- 
zon where bright sources such as Pictor A and Fornax 
A rise during these observations. Another possibility is 
the presence of a polarization leakage from a signal with 
positive rotation measure (Moore et al. 2013). These 
theories will all require substantial follow-up observation 
and careful analysis to validate. 

As a diagnostic, we apply our analysis to random noise 
injected into the analysis before the narrowband delay 
transform (see Figure 2) at the level expected for these 
observations with a system temperature of 560 K, which 
was measured at 160 MHz using frequency-domain vis- 
ibilities after the application of a wideband delay filter 
(Figure 4) . These noise simulations agree with Equation 
27 of Parsons et al. (2012a), which is the basis of the 
dashed cyan curve in Figure 6. We find good agreement 
between this expected noise level, the estimated power 
spectrum, and the errors derived from bootstrapping. 



4.3. Constraints on the Mean 21cm Brightness 
Temperature 

The limits shown in Figure 6 are not yet at the level of 
the brighter predictions favored by numerical and semi- 
numerical simulations (McQuinn et al. 2007; Zahn et al. 
2007; Ihev et al. 2008; Santos et al. 2008; Mesinger et al. 
2011) that include, among other things, the predicted 
effects of X-ray heating on the IGM. Nonetheless, we 
can, for the first time^®, constrain the 21cm brightness 
temperature, (Tb), of the neutral gas at mean density for 
several reionization models and ionization fractions. 

We recall that the brightness temperature contrast of 
the 21cm transition relative to the CMB (neglecting ve- 
locity gradient terms, and assuming low optical depth, 



T21) can be written in the form 
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where Ts is the spin temperature, xhi is neutral fraction, 
and 5 is the density contrast, all of which are spatially 
dependent. The pre- factor is then 



Tq{z) = 23 mK 
= 25 mK 



0.02 



10 



0.15 



l + z 
10 



1/2 



1/2 



(8) 



(see, e.g., Zaldarriaga et al. 2004; Furlanetto et al. 2006), 
where we have used the Planck 2013 parameters (Planck 
Collaboration et al. 2013). 

Most reionization models assume an early period of 
UV-photon production from stars that, while insufficient 
to ionize the IGM, serves to drive the spin tempera- 
ture towards the gas temperature via the Wouthuysen- 
Field effect. This is followed by heating of the neutral 
gas, primarily through X-rays associated with star for- 
mation and black hole accretion. In these scenarios, 
heating drives Ts ^ Tcmb during reionization, so that 
^ = [Ts^ - Tcmb)/Ts « 1. The brightness contrast 
can be significantly larger, however, if T5 < Tcmb- In 
this case, the 21cm line is seen in absorption against the 
CMB. 

The maximum value of this contrast corresponds phys- 
ically to a situation in which the spin temperature 
remains tightly coupled to the gas temperature via 
the Wouthuysen-Field effcct^^ (Wouthuysen 1952; Field 
1958; Hirata 2006), but in which heating of the neutral 
gas by any mechanism (X-rays, Ly-a, or shocks) is negli- 
gible. While a full characterization of this "no-heating" 
scenario would require a detailed simulation, we note 
that a reasonable re-scaling of power spectra for which ^ 
was assumed to be one (as in Lidz et al. 2008) can be ob- 
tained in the following way. Because reionization tends 
to rapidly produce either fully neutral or fully ionized 
regions, xhi can be locally represented as either (for 
an ionized region) or 1 (for a neutral one) . Since we have 
assumed no heating of the neutral medium, the spatial 
dependence of ^ is negligible. This turns the product 
Tq^ into an overall multiplicative factor on the temper- 
ature fluctuations in Equation 7, and thus re-scales the 
dimensionless ionization power spectrum, A|(fc) (calcu- 
lated from analytic or numerical models), as 

^li{k) = T^{z)e{z)^^,{k) ^ {n^AUk), (9) 

We can estimate the maximum (Tb) produced in the 
no-heating scenario by noting that the CMB and 21cm 
spin temperature must stay in equilibrium due to resid- 
ual Compton scattering until about z ^ 150 (Furlanetto 
et al. 2006). Maximum contrast occurs if T5 tracks the 
gas temperature thereafter, so that we can parameterize 



^^ The constraint in Paciga ct al. (2011) on the brightness tem- 
perature of 21cm emission for a single-bubble reionization model 
was later retracted in Paciga et al. (2013). 



^^ Recent measurements (Bouwens et al. 2010; Schenker ct al. 
2012) show that there are certainly enough UV photons for this to 
be reasonable approximation at z ^ 8. 
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Fig. 8. — Constraints at z = 7.7 on the absolute value of the 21cni brightness temperature of the neutral gas (not normalized by ionization 
fraction), |(Ti,)|, for the patchy reionization model described in Equations 9 and 12, as a function of the upper (fcmax) and lower (fcmin) 
bounds on the scale of fluctuations. Color indicates the maximum |(Tj,)| (in mK) consistent with PAPER'S 2cr upper limits on A|j^(fc) (see 
Figure 6) for the listed ionization fractions. Because the power spectrum amplitude for patch reionization is invariant under the interchange 
of ionized and neutral regions, Xi = 0.1 and 0.3 equivalently correspond to Xi = 0.9 and 0.7, respectively. The maximum color scale of 400 
mK indicates the threshold of brightness temperatures that can be excluded a priori on the basis of the maximum contrast between the 
21cm spin temperature and the CMB (see Equation 11). The black dot and black triangle indicate the coordinates of a patch-reionzation 
approximation to the fiducial and high-mass halos models illustrated in Figure 9, respectively. 
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Fig. 9. — Scaled 21cm EoR power spectra predicted for the models described in §4.3 (Lidz et al. 2008), reflecting bounds on the 21cm 
brightness temperature (T;,) at z = 7.7, according to Equation 9. Magenta curves illustrate power spectra scaled according to (T;,) ^ 30 mK, 
as predicted by simulations that include the effects of X-ray heating on the IGM. Cyan curves illustrate power spectra scaled by the maximum 
(T;,) that is under the 2a upper limits we measure (black). These are, left to right, top to bottom, (T;,) = 471, 451, 315, and 375 mK, 
respectively. 
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the maximum value of S, as 

1 + Zdc 



1 



150 



1 + z 



1 



(10) 



where z^ec is the redshift at which the kinetic temper- 
ature of the IGM gas and the CMB drop out of equi- 
hbrium. Combining Equation 10 and Equation 8 leads 
to 



max[To{z)^{z)] « 370 mK 



10 



1/2 



(11) 



To obtain a bound on (Th), we treat ^ as a free param- 
eter ranging from ^max < C ^ 1 that parametrizes the 
relative degree of heating of the IGM. We first note that 
brightness temperature fluctuations in excess of Equa- 
tion 11 are ruled out a priori at a given redshift, as in- 
dicated by the maximum color scale corresponding to 
(Tb) > 400 mK in Figure 8. We then examine three dif- 
ferent models for A|(fc) at various ionization states, and 
determine for each the maximum (T;,) , beyond which the 
predicted value for A|j^(fc) is inconsistent at the 2cr level 
with the measurements shown in Figure 6. 

We examine two models presented in Lidz et al. (2008) 
based on simulations by McQuinn et al. (2007). Al- 
though the ionization power spectra are associated with 
suggested redshifts in that paper, they apply to any red- 
shift for the indicated ionization fraction, Xi. The first 
model that we examine is the fiducial "SI" model. For 
the Xi = 0.5 and Xi = 0.75 ionization power spectra, we 
obtain hmits of (Tf,) < 471 mK and (Tb) < 451 mK, re- 
spectively. These limits are illustrated in the upper two 
panels in Figure 9. 

In the second model that we examine, "S3" , reioniza- 
tion is dominated by emission from massive halos, and 
the 21cm power spectrum peaks at lower fc-modes in the 
final stages of reionization because of the increased spac- 
ing between the dominant sources of ionizing photons. 
For this model, we obtain limits of (Tf,) < 315 mK and 
(Tb) < 375 for Xi = 0.5 and 0.75, respectively. These 
limits are illustrated in the lower two panels in Figure 9. 

The third model that we consider is a toy "patchy" 
reionization model described by Equation 21 in P12a, 
which approximates A2i{k) as being flat between the 
bounds of a minimum (fcmin) and maximum (fcmax) cut- 
off in fc, scaled appropriately so that total power is con- 
served: 



A^(/c) = {xHi - a;^j)/ln(A:,„ax/fcmin), 



(12) 



where xhi = 1 — a;i is the neutral hydrogen fraction. 
Using this toy model, we tune fcmin and fcmax and deter- 
mine a maximum mean temperature, above which Equa- 
tion 9 becomes inconsistent with our measurements, as 
shown in Figure 8. While our measurements are not 
yet able to exclude (Tb) ~ 30 mK predicted for an X- 
ray-heated IGM under the reasonable assumption that 
fcmax/fcmin > 10, we are able rule out (Tb) > 370mK for 
fcmin < 0.1 ft- Mpc^^ for nearly all physically motivated 
values of fcmax in both the Xi = 0.3 and Xi = 0.5 models. 
For Xi = 0.1, this result becomes dependent on the value 
nf k 

^^ "-max- 

4.4. Implications for X-ray Heating 



The constraints shown in Figures 8 and 9 indicate that, 
for several reionization models and ionization fractions, 
our measurements are inconsistent with the 21cm bright- 
ness temperature from neutral regions that would result 
from the gas in the IGM cooling strictly according to adi- 
abatic expansion after recombination, without additional 
heating. With the caveats that 1) the models we use may 
not correspond to the actual ionization power spectrum 
of reionization, and 2) we do not know whether the ion- 
ization fraction at z « 8 is near to the ionization fractions 
we investigate (although models such as Zahn et al. 2007 
and Lidz et al. 2008 do predict Xi — 0.5 near z « 8), 
the implication of our measurements is that the IGM 
may have been heated with respect to adiabatic cooling. 
X-ray heating from inverse Compton scattering from su- 
pernovae, high-mass X-ray binaries, or mini-quasars is 
the most likely culprit, although shock heating, which is 
currently disfavored (McQuinn & O'Leary 2012), has not 
been completely ruled out as a possibility. 

We have used the PAPER results to explore some ba- 
sic models for reionization. Under the simplifying, al- 
though perhaps not unreasonable, assumption that (Tt) 
is roughly factorable from the growth of structure (e.g. 
ionization bubbles and/or cosmic density), we constrain 
(Tb) in Figures 8 and 9 using limits to brightness tem- 
perature fluctuations observed by PAPER. For example, 
assuming a typical minimum fc-scale of ionization fluctu- 
ations of 0.1 h Mpc~^, and a maximum of 10 h Mpc~^, 
then as shown in Figure 8, the PAPER results imply up- 
per limits on | (Tb) \ of 240 and 220 mK for neutral frac- 
tions of 30% and 50%, respectively, at z = 7.7. For cur- 
rently favored reionization models dominated by emis- 
sion from massive halos (McQuinn & O'Leary 2012), PA- 
PER results imply upper limits between 315 and 375 mK. 

In summary, for massive-halo-dominated reionization 
models, and for a range of simple patchy reionization 
models, PAPER data imply that the value of (Tf,) is less 
than 400 mK. The value of 400 mK is important, since 
it reflects the expected (Tb) in the case where the spin 
temperature of the 21cm line is coupled to the gas kinetic 
temperature, but the neutral gas is never heated on large 
scales due to, e.g. X-rays from high-mass X-ray binaries 
or mini-quasars. Physically, this could correspond to a 
situation where the Lya photons from the first galaxies 
are sufficient to couple the spin and kinetic temperatures 
throughout the neutral IGM via the Wouthuysen-Field 
effect, but they do not substantially heat the gas (Chen 
& Miralda-Escude 2004), nor do supernovae, X-ray bi- 
naries, or mini-quasars produce enough X-rays to heat 
the large scale neutral IGM. We point out that while the 
X-ray heating constraints we present require a coupling 
between the spin and kinetic temperatures, we know that 
this coupling must be occurring at z — 7.7. Observed 
star formation rates (SFRs) at z = 8 (Bouwens et al. 
2010; Schenker et al. 2012) exceed by almost an order of 
magnitude the threshold of lO~^M0/yr Mpc^ required 
for UV photons to couple these temperatures (McQuinn 
& O'Leary 2012). 

While this no-heating model is considered physically 
unlikely (Furlanetto et al. 2006), the argument can be 
turned around: the fact that we constrain fluctuations 
at the 400 mK level provides, for the first time, sugges- 
tive evidence for large-scale heating of the neutral IGM 
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during reionization at z > 7. Such constraints on X- 
ray heating may soon have reprocussions on the global 
signal (Pritchard & Loeb 2012) and on the morphology 
of ionization (Mesinger et al. 2013). Note that our mea- 
surements pertain to the thermal evolution of the neutral 
gas during reionization, and not the subsequent thermal 
evolution of the ionized gas after reionization (Hui & 
Haiman 2003). 

5. CONCLUSION 

We have set the most stringent limits to date on 
the 21cm EoR power spectrum. This achievement was 
greatly facilitated by using the delay-spectrum approach 
to achieve a level of foreground isolation of -80 dB out- 
side of the horizon limit. This is the deepest level of 
foreground suppression that has been achieved so far by 
a 21cm reionization experiment, and with a 2ct upper 
limit of A|i(fc) < (52mK)2 at A; = 0.11 h Mpc"\ the 
result is within an order of magnitude (in mK) of pre- 
dictions of the 21cm EoR signal level. These constraints 
arc sufficient to begin ruling out several scenarios where 
X-ray and shock heating fail to heat the IGM prior to 
reionization, although we cannot rule out the possibility 
of a cold IGM with low neutral fraction at z — 7.7. 

Although PAPER has less collecting area than other 
competing 21cm reionization experiments, its ability to 
access lower fc-modes than, e.g., Paciga et al. (2011), as 
a result of its smooth instrumental responses, combined 
with the sensitivity boost that arises from the use of a 
maximum-redundancy array configuration, shows the ef- 
fectiveness of the approach taken by PAPER. The effi- 
cacy of the wide-band delay filter in this analysis shows 
paper's wide operational bandwidth to be a valu- 
able asset for isolating foreground emission in fc-space. 



This work represents an important first validation of the 
delay-spectrum approach to avoiding foregrounds. 

At the limit of the sensitivity of these observations, 
we are beginning to see signs of bias at low fc-modes, 
presumably arising from foregrounds or other systemat- 
ics. Additional investigation will be necessary to fur- 
ther diagnose the source of this bias and constrain its 
source. Using additional data that have been observed 
but not yet analyzed, and a novel filtering technique, we 
expect forthcoming analysis to yield nearly an order-of- 
magnitudc improvement in sensitivity, in units of mK^, 
over the results presented here. Additionally, we also in- 
tend to explore this power-spectrum analysis over a wider 
bandwidth to constraint a broader range of cosmological 
history. Meanwhile, observations are proceeding with 
a 64-antenna, dual-polarization PAPER deployment in 
South Africa, and work is underway to expand PAPER 
in South Africa to 128 antennas by late 2013. Plans are 
also proceeding for a next-generation instrument, the Hy- 
drogen Epoch of Reionization Array (HERA)^°, which, 
with a thousand or more antennas, will have the capa- 
bility to characterize the power spectrum of reionization 
in detail. 
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APPENDIX 
A. A DATA COMPRESSION TECHNIQUE FOR LOW-FREQUENCY TELESCOPES 

Managing the volume of data generated by modern radio telescopes is a technical challenge that is becoming increas- 
ingly problematic. For example, the volume of raw data on which this paper's results are based exceeds 10 Terabytes. 
With data volume scaling quadratically with antenna number, PAPER observations in the near future are likely to 
occupy more than 0.5 Petabytes. This problem is common to many current arrays, and several data-reduction schemes 
(e.g. Ord et al. 2010) have been proposed. These schemes generally place a substantial burden on real-time calibration, 
a feat that, while possible, often proves elusive, particularly for arrays that are under active development. 

In this section, we describe a new method of (lossy) data compression based on delay/delay-rate (DDR) filters 
(Parsons & Backer 2009) that are tailored to the geometric limits of celestial emission in an interferometer's native 
observing coordinates, omitting data outside of those limits. In contrast to data-reduction pipelines that require 
real-time calibration, DDR-bascd compression requires a minimal amount of information at the time of observation, 
provided that instrumental responses are smooth in time and frequency. In the minimal limit, substantial compression 
proceeds based simply on knowing the maximum baseline length in the array. 

To briefly review the principles of DDR filters presented in Parsons & Backer (2009) , the geometric delay with which 
a celestial signal originating in direction s enters an interferometric baseline b is given by 



b • s/c 



(Al) 



where c is the speed of light. Following directly from this definition, the bounds on the geometric delay for this baseline 
are 

Ibl . . Ibl 



<rg< 



(A2) 
c c 

For the maximum baseline length of 300 m, |b|/c is approximately 1/iS, which corresponds to —0.52 < fcii < 

0.52 h Mpc"'^ at z « 8. An example of the bounds of a delay filter in DDR space is given by the shaded cyan 
region in Figure 10. It is worth noting that the range of fcii that are occupied by foregrounds for the 30-m baselines 



http://rcionization.org/ 



16 



Parsons, et al. 



Delay 
Skypass 



N 

X 

■M 




^ *« ill; 



■10,000 -5000 

Delay [ns] 



5000 



10,000 



I 






2.0 

1.8 
1.6 
1.4 
1.2 
1.0 
0.8 
0.6 
0.4 
0.2 
0.0 



Fig. 10. — The delay/delay-rate (D/DR) transform of RFI-flaggcd visibilities from a 30-m PAPER baseline. Raw visibilities are integrated 
for 10.7 s with a channel bandwidth of 50 kHz to facilitate the detection and removal of RFI and other transient events. The regions of 
D/DR-space sampled according to these observing parameters far exceed the delays (cyan) and delay-rates (magenta) that can be occupied 
by sky emission that smoothly varies versus time and frequency, as dictated by a maximum baseline length of 300 m in the array. Color 
scale indicates logjQ(Jy), with arbitrary scaling. 

used in the the power spectrum analysis is substantiaUy narrower than the maximum bounds placed by this data 
compression step. This leaves a substantial region for probing the cosmological 21cm signal that includes the modes 
to which we are most sensitive. 

Taking the Fourier transform of a visibility along the frequency axis (see Equation 2) produces a delay-domain 
expression of the visibility, which we can approximate for a model sky of point sources indexed by n as 



V{T,t) 



Y,M'r,Sn{t))*Sn{T)*6u{Tg 



r), 



(A3) 



where A{t, Sn(t)) indicates the Fourier transform of the antenna response in source direction s„ along the frequency 
axis, Sn indicates the Fourier transform of the source flux density along the frequency axis, Su is a delta function 
relating delay to the geometric delay Tg^„ in the direction of the source indexed by n, and '*' indicates a convolution 
in delay. 

Provided that celestial emission and instrumental responses are smooth versus frequency, so that A and S are 
compact in r, emission in V will be tightly bounded by the geometric constraints on Tg given in Equation A2, and a 

low-pass filter may be applied to V that preserves this range of r-modes. In doing so, such a filter preserves smooth- 
spectrum celestial emission, and removes modes that are expected to consist mostly of noise. Moreover, with the range 
of delay modes now being band-limited, visibilities may be decimated in frequency without aliasing, according to the 
critical Nyquist rate. This decimation is the basis of data compression in the frequency direction. 

Similar geometric limits apply to the variation of visibilities in the time dimension. As described in Equation 7 of 
Parsons & Backer (2009), the rate of change of the geometric delay versus time — that is, delay-rate — is given by 



dTg 
dt 



-iOa 



■ sin H 



cos H I cos S. 



(A4) 



where b = {bx,by,bz) is the baseline vector expressed in equatorial coordinates, w® is the angular frequency of the 
earth's rotation, and H, S are the hour-angle and declination of a point on the celestial sphere, respectively. As a 
result, there exists a maximum rate of change based on the length of a baseline projected to the z = equatorial 
plane. For arrays not deployed near the poles, \by\ ^ \bx\ (i.e., they are oriented more along the east- west direction 
than radially from the polar axis), and the maximum rate of change corresponds to -ff = and S — 0, where we have 



dt c 



c 



(A5) 



For a maximum east-west baseline length in the PAPER array of 300m, ii!^\by\/c is approximately 0.07 ns/s. To better 
elucidate the meaning of this bound, we take the Fourier transform along the time axis (see Equation 8 in Parsons 
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& Backer 2009) for a model visibility consisting of a single point source located at the point of maximum delay-rate, 
which gives us 

V{u, /) « A{v, /) * S{v) * I e^^^*"® ^*e-2"/*dt 

«i(i.,/)*5M*(5D(^^e^-/), (A6) 

where / is the fringe-rate of the sourcc^^, A{h',f) indicates the Fourier transform of the antenna response along the 
time direction, and approximation is indicated because we assume \by\ ^ \bx\, and because the Fourier transform must 
involve a discrete length of time, during which our assumption that cos H Ri 1 breaks down at second order. The delta 
function above gives rise to the expression for the maximum fringe rate in Equation 3. 

This example of a source with a maximal fringe-rate serves to show that a filter may be applied in delay-rate domain, 
using the fact that the maximum delay-rate is bounded by the maximum fringe rate within the band (i.e. evaluating 
Equation 3 at the maximum v involved in the delay transform) , to remove emission that exceeds the variation dictated 
by array geometry for sources locked to the celestial sphere. As in the delay filtering case, assuming the geometric 
bounds on delay rate implicitly assumes that A and S are compact in /, which is to say that instrumental responses 
and celestial emission must be smooth in time; variable emission from, e.g., fast-transients will be heavily suppressed 
by such delay-rate filters. For PAPER, with a maximum baseline length of 300m and a maximum observing frequency 
of 200 MHz, filtering at the maximum delay rate corresponds to an effective integration time of 68.5 s. An example 
of the bounds of a delay filter in DDR space is given by the shaded magenta region in Figure 10. As in the delay 
filtering case, filtering along the delay-rate axis permits substantial down-sampling of the signal, which is the basis for 
the reduction in data volume along the time axis. We note that for the analysis in §3.1, we choose to use a slightly 
wider delay-rate filter to be conservative in our first application of this technique, corresponding to an integration time 
of 43 seconds. 

A serious issue associated with DDR filtering pertains to how data flagging associated with the removal of RFI and 
systematics violates the assumptions of spectral and temporal smoothness that are implicit in the geometric bounds 
we derive for delay and delay-rate. This issue is addressed in Parsons & Backer (2009) by a CLEAN deconvolution for 
removing the sidelobes created by the unsmooth sampling functions associated with data flagging. We apply the same 
technique here for DDR-based data compression. We also take advantage of the substantial suppression of celestial 
emission that results from the removal of low delay and delay-rate modes to improve RFI flagging. Hence, our approach 
to implementing DDR-based data compression proceeds as follows: 

1. Coarse RFI flagging, using time and frequency derivatives to suppress celestial emission. 

2. DDR transformation and CLEAN deconvolution of data from a subset of baselines. This culminates in a CLEAN 
model of emission for each of these baselines, with model components restricted to fall within specifled geometric 
limits. In order to avoid edge effects when filtering along the time axis, we filter data windows that overlap by 
a factor of 3 and retain only the central third of the filtered data. 

3. Refined RFI flagging of residuals after subtracting the above model from each baseline in the subset. 

4. Application of the RFI flagging to all baselines on the basis of occupancy in time/frequency bins. 

5. DDR transformation and CLEAN deconvolution of data from all baselines, based on the refined RFI flagging. 

6. Filtering of DDR-domain data into regions that are saved to three separate output files, as follows: 

(a) Smooth-Spectrum Non- Transient Emission. These data arc filtered at the intersection of the delay and 
delay-rate low-pass filters (cyan and magenta, respectively, in Figure 10) and decimated to the critical 
Nyquist rate along both time and frequency axes. Data are saved to disk using a high dynamic range 32-bit 
floating point data type. Compression factor for PAPER: ^40. 

(b) Smooth-Spectrum Transient Emission. These data are filtered by a low-pass delay filter and a high-pass 
delay-rate filter (cyan in Figure 10, excluding the magenta region) and decimated along the frequency axis 
to the critical Nyquist rate. Since the bulk of celestial emission has been suppressed by not including low- 
order delay-rate modes, these data are written using a lower dynamic range 16-bit shared exponent fioating 
point data type that is adequate for representing a noise-like signal. Compression factor for PAPER: ^20. 

(c) Unsmooth Non- Transient Emission. These data are filtered by a high-pass delay filter combined with a 
low-pass delay-rate filter (magenta in Figure 10, excluding the cyan region) and decimated along the time 
axis to the critical Nyquist rate. These data are also written in the lower dynamic range data type described 
above. Compression factor for PAPER: ^8. 



'^^ Delay rate is equivalent to the frequency-integrated fringe 



rate. 
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Fig. 11. — The dynamic spectrum of RFI-fiagged visibilities from a 30-m PAPER baseline prior to (left) and after (right) the application 
of data compression along the delay and delay-rate axes. Color scale indicates logj^Q(Jy), with arbitrary scaling. 



The result of the appHcation of this procedure for smooth-spectrum non-transient emission to data from one of the 
basehnes used in this analysis is shown in Figure 11. As shown, the spectrally and temporally smooth components 
of the signal are preserved by data compression, while the unsmooth components that originate predominantly from 
thermal noise are removed. This compression algorithm is lossy, but the regions lost in the compression (namely, the 
unshaded corners in Figure 10) are likely to be of minimal import for many low-frequency radio telescopes. 

The computational cost of DDR compression is dominated by the deconvolution of sampling function that is required 
because of RFI flagging. The CLEAN algorithm is nonlinear, but the computation time generally scales as 7V^, where 
N is the number of antennas in an array. A 35-core cluster was sufficient to compress 32-antenna dual-polarization 
PAPER observations roughly four times faster real time. For an upcoming 128-antenna deployment of PAPER, we 
will be deploying a 64-core cluster that is expected to be sufficient to compress data two times faster than real time. 

This technique can be easily generalized to other arrays, as it requires minimal information about the geometry 
of an array. Compression factors could be improved by more carefully tailoring DDR filters to the geometry of each 
baseline individually. We have not explored this on PAPER, as the advantages of a compression system that requires 
minimal information about the array at run-time and is minimally restrictive with respect to further analysis seem to 
outweigh the need for additional data compression at this time. 

B. ON CALCULATING BEAM AREAS FOR NORMALIZING POWER-SPECTRUM MEASUREMENTS 

In this section, we investigate in greater detail the issue of how beam area enters into the power-spectrum formalism 
for 21cm intensity mapping derived in Morales (2005), McQuinn et al. (2006), Pen et al. (2009), and P12a. In 
particular, we show that, for the purpose normalizing power spectrum measurements by the integrated volume, the 
appropriate beam area to use is proportional to the integral of the square of the power response. This fact has been 
obscured in many derivations in the literature by certain approximations that have been made to make volume integrals 
analytically tractable. This derivation represents the general form of the special case for a Gaussian beam that was 
correctly derived in Pen et al. (2009). 

We begin by examining the integrated volume, V, used to normalize the 3D Fourier transform in Equation 3 of P12a. 
We express this volume in observing coordinates as 



= nB 



X^Y, 



(Bl) 



where B is the bandwidth, Q is the angular area, and X, Y are redshift-dependent scalars relating angle and frequency 
to spatial scales, respectively, fl arises from the bounds set by A(l, m, v), the antenna power response, on the angular 
extent in the integral 



V'^{u,v,vi) 



2fcB 

A2 
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dV dm' dzy'v4*(/', to', !/')?'*('', ™',i^')e^"^"''^'""'^'"''-' 



(B2) 



which is a slightly modified version of Equation 6 of P12a relating the delay-transformed visibility V , sampled at 
wavemodes u,v (the Fourier complements of angular coordinates l,m) and "q (the Fourier complement of spectral 
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frequency i^), to a temperature field T. Re-expressing this in real-space cosmological coordinates, we have 
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We may then use the definition of the Fourier transform (following the convention of Equation 1 in P12a) to write 

2 ,,2 , ^3^, 
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2k 

T2 ; (X2y)2 J (27r)3 
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with the multiplicative terms in the previous equation now appearing as convolutions, and A indicating the 3D Fourier 
transform of the antenna power response. Combining the two integrals yields 
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where the second step uses the relationship between an estimate of the power spectrum -P(k') and the two-point 
correlation function of a temperature field given in Equations 2 and 3 in P12a. As discussed in P12a and McQuinn 
et al. (2006), the narrow width of A in fc-space in comparison to the scales on which P(k) varies allows P(k) to be 
removed from the integral, giving us 

2 T,r 



V'^{u,v,r]) 
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Using the coordinate substitution q = k' — k, we have 
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Finally, we invoke Parseval's theorem to substitute the integral of the variance in real-space, picking up a factor of 
1/V in accordance with our chosen Fourier convention, so that we have 
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We compare this result with the relation between the delay-transformed visibility, V , to the three-dimensional power 
spectrum of reionization, P2i(k) (P12a): 



y^iiu.v.vi) 



2k 
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As this shows, the relevant beam area in Equation B9 is the power-square beam, ilpp, given by 

|2 



f2pp = dl dm \A{1, m) 



(B9) 



(BIO) 



This contrasts with the standard metric for beam area — the integrated power beam — which we will call fip , and is 
given by 



Qt 



dl dmA{l, m), 



(Bll) 



This beam area metric is used to convert visibility measurements from Jy units to mK, but is incorrect for normalizing 
power spectra that relate to the two-point correlation function of a temperature field. 
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In the literature to date, sensitivity derivations have commonly assumed a top-hat beam response or a sine response 
in the uw-plane (see, e.g.. Morales 2005, McQuinn et al. 2006, and P12a). Since ftp = flpp under these assumptions, 
this has permitted some ambiguity as to which beam area is intended. For equations that relate power-spectrum 
sensitivity to a system temperature (e.g. Equations 15 and 16 in P12a) 

n' = nl/fipp (B12) 

should be used in lieu of fJ, as these equations pick up two factors of flp in the conversion from Jy^ to mK^, along 
with the a factor of f2pp in the denominator relating to the integrated volume. For equations that relate a measured 
visibility (in units of brightness temperature, e.g. Equation 5) to P(k), the factor of Up is already applied in the 
conversion from units of Jy to mK, and O corresponds to the remaining factor of flpp. 

For PAPER, ftp sa 0.72 sr, while ilpp is 0.31 sr. Following the definition above, ft' sa 1.69. These beam areas are 
calculated numerically from a beam model, but typically, 0' is about a factor of two larger than flp. 

C. OPTIMALITY OF ANALYSIS FOR REMOVING OFF-DIAGONAL COVARIANCES 

In this section, we investigate delay-domain analysis through the lens of the quadratic estimator formalism (Liu & 
Tegmark 2011; Dillon et al. 2013a), ultimately showing that the analysis technique of removing off-diagonal covariances 
arising from systematics produces optimal power spectrum estimators, to leading order. 

C.l. Review of Quadratic Estimator Formalism 

We begin by reviewing the quadratic estimator formalism. In this formalism, the value of the power spectrum pa in 
the ath fc-bin (i.e. P{ko,), also known as the "bandpowcr") can be optimally estimated by the quantity^^ 

Pc.^\Y. M^^xtC-iQ'^C-ix, (CI) 

where the hat signifies that Pa is an estimate of the true bandpower p^, x is a vector containing binned data with 
Xa = V{Ta) from Equation 2, C = (xx^) is the covariance matrix of the data. Map (discussed below) normalizes the 
estimator, and Q^ is the response of the covariance to the /3th bandpower. The family of Q matrices (one for each 
value of (3) map the vector space of bandpowers to the vector space of data, and is constructed to satisfy the relation 

P 

where C'*^ is the contribution of sky emission to the covariance. This contribution includes both the foregrounds and 
cosmological signal. The foregrounds are included with the signal because we have conservatively avoided the direct 
modeling (and subtraction) of foregrounds from the data (Dillon et al. 2013b). The total covariance is given by 

C = N + CJ™'' + C'^''y, (C3) 

where N is the instrumental noise covariance, and CJ""'^ is the contribution of all contaminants except for foregrounds 
and noise. 

Thinking of Map as being an element of a matrix M that lives in a vector space of bandpowers, one is free to choose 
between several different choices of M, all of which are lossless (since M is generally taken to be an invertible matrix, 
and all steps prior to its application are also lossless) but have different error properties. Consider for example the 
window function matrix W, which relates the expectation value of our bandpower estimators to the true bandpowers: 

(p) = Wp, (C4) 

where we have grouped the bandpowers into vectors. The window function matrix can be shown to take the form 

W = MF, (C5) 

where F is the Fisher information matrix, given by 

Fafi^\tr[C~^CrC~^q,^]. (C6) 

Different choices for M give window functions (rows of W) with different properties, and one is subject only the 
restriction that M must be normalized in a way that ensures that each window function integrates to unity to conserve 
power. We pick M = F^^ without loss of generality, a fact that we will explain below (see Dillon ct al. (2013b) for 
other contexts where one might entertain other choices for M, such as taking M to be diagonal). 

^^ Equation CI in principle requires an extra bias removal term u r c- i j i ■ ■ u- /-t-ii 

J . , .■ 1 /T ■ rr^ 1 or,i 1 T^ii i 1 nr^m \ Daselines. Since cross-power spectra do not incur noise biases Dil- 

m order to be optimal (Liu & legmark iOll; Dillon et al. 2013a . , , , om,. n c i li -u i ■ -tu u- i 

„ . ,. ., ^ ■.,,-, ■ ,■ ■ ,■ r n .■ p^ o Ion et al. iOlvib , our nnal result will not require the bias removal 

tor simplicity we omit this term in anticipation ot Section gG.2, , 

where we will generalize our results to cross-power spectra between 
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It is important to note that Equation CI estimates power spectra from single-baseline data. The length of x is 
therefore equal to the number of delay bins, and the covariance denoted here relates different delay bins from the 
same baseline only. The covariance matrix shown in Figure 5, in contrast, includes correlations between delay bins 
of different baselines as well. However, its analysis in §3.7 serves only to provide a better estimate for C by taking 
advantage of redundant baselines. In other words, our final result in §C.2 will yield a procedure for removing off- 
diagonal systematics that operates on a per-baseline basis once C has been estimated, with information from different 
baselines mixing only in the cross-multiplication to form the power spectrum. 

C.2. Application to Delay- Transformed Visibilities 

We now turn to applying the quadratic estimator formalism to delay-transformed visibilities. As derived in P12b, 
T-modes are a good approximation to fcii , and their mutual orthogonality makes the simplified analysis in §3.6 a good 
approximation to optimal analysis. As a result, we only apply the full quadratic formalism to leading order in order 
to safeguard against the mis-modeling of covariances, which are estimated empirically from a finite dataset, and are 
therefore imperfect representations of the true covariances. In this case, the leading-order optimal estimator reduces 
to the simple prescription given in Equation 5, provided that we used modified visibilities (as shown below) in place 

of V(T,i). 

Working in delay space, the Q matrices become particularly simple, since delay modes are to an excellent approxima- 
tion the same as Fourier modes along the line-of-sight (P12b). In this approximation, Cgky is diagonal, with elements 
equal to the bandpowers, i.e. 



-isky 
■"AB 



SabPa 



(C7) 



where A, B index delay modes, and 5ab is the Kronecker delta function. It follows that in this basis, the Q matrices 
simply pick out the correct delay /Fourier component: 



^AB 



VLB /2fc 



X^Y \ X 



SabSap- 
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Like the sky signal, the instrumental noise covariance can be modeled as a diagonal matrix, so that 

Nab = Sab(t% (C9) 

where cta is the RMS noise fluctuation in the ^th delay bin. 

With ideal data, where Cjunk = 0, the preceding discussion implies that C should be diagonal. However, as we can 
see from Figure 5, the actual data covariance contains non-ideal off-diagonal entries. We model the real covariance 
matrix as 
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where r^ = yJpA + <^\ is the total RMS power (sky plus noise) in the Ath. delay bin, and eab is the coupling constant 
between the Ath and Bih bins. Since C is Hermitian, we note that eab = £*ba- Note that so far, our parameterization 
of C is still completely general because we have yet to specify the e parameters; the fact that the off-diagonal entries 
are scaled to the diagonals is simply a convenient convention driven by the observation that our empirically derived 
covariances seem to roughly follow this form. By comparing equations C3, C7, and C9, we see that^^ 
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We will now proceed to compute the various quantities required for our power spectrum estimator, equation CI. 
First, we compute C~^, which will be our primary tool for mitigating the influence of the junk contribution. Instead 
of performing a brute force inversion of C, we can make the observation that it is equal to 



(N + c'^y + cj 



iunkN 



(N + c'^y)-'^ - (N + c^y + cJ""'^)-icJ™''(N + c^y) 



(C12) 



^^ It is important to emphasize that in general, systematie con- 
taminants need not be limited to off-diagonal entries. However, 
diagonal systematics are difScult to distinguish from signal with- 
out other distinguishing features, and so here we set Cjunk to be 
traceless by definition. In effect, any systematics are included as 
additional signal along the diagonals of Csky or N. An examina- 



tion of our result will reveal that this amounts to a conservative 
approach where we leave diagonal systematics untouched in an ef- 
fort to avoid subtracting cosmological signal. 

^"^ This is a special case of the binomial inverse theorem. 
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Taking advantage of the fact that (N + C'^y + CJ""!^ 
into a series: 
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)^^ appears on the right hand side, we can iterate this expression 



=(N + c'^y)-'^ - (N + c"''y)-icJ™''(N + c"''y)-i 



(C13) 



Writing C^^ in this form makes our results more robust to "noise" that arises from our empirical determination of the 
covariance matrix. For example, our series expansion does not require the inversion of Cjunk, which may be unstable 
under inversion thanks to our imperfect estimation of the covariance. One only needs to invert the combination 
N + C*''^'', which is trivial because both N and C*''^^ are diagonal. With a systematic series expansion, we can 
also henceforth work only to leading order in the e parameters. We make this choice based on the intuition that 
uncertainties in our covariance estimation mean that higher order terms cannot be reliably estimated. A leading-order 
approximation may therefore be preferable to using an overly "noisy" covariance to determine C^^, as that would 
result in an overfitting of the noise in the data (Tegmark et al. 1998). 
Taking only the first two terms of equation C13, one obtains 
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Inserting this into equation C6, it is straightforward to show that the effects of Cjunk affect the Fisher matrix only to 
second order in the e parameters and may therefore be neglected: 
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The final ingredient that we require before we can estimate power spectra is the normalization matrix M. As we 
remarked above, we will pick M — F^^. If wc had instead picked some popular alternatives such as taking M to be 
diagonal, it would have made no difference, since we have just shown that the Fisher matrix is diagonal to leading 
order. 

With a diagonal Fisher matrix, we can further simplify our power spectrum estimator. We first define a family of 
matrices 



(j„ = 



2 . 



Ma.pQ" 



so that our estimator can be written compactly as 

p^ = IGV^c-ixp. 
Inserting M = F^^ as well as equations C8 and C15 into our definition of Gq yields 
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Substituting this and equation C14 into equation C17, we obtain 
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This, then, is our leading-order optimal estimator for the bandpower pa in the presence of a "junk" covariance that 
couples different delay bins. Intuitively, our estimator instructs us to subtract from each delay bin a predicted leakage 
from all other bins, which is determined by a coupling constant and the ratio of RMS power levels between the bins^^. 
The resulting quantity is then squared to yield a final power estimate. 

Written in this way, it is also clear how one would generalize the result to cross-power spectra. The optimal estimator 
for cross-power spectra between data from baseline i and baseline j, for example, would be 
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^^ If robust models of diagonal systematics were available, their 
inclusion in Cjunk would simply result in the elimination of the 



^ a restriction on the sum in Equation C19. 
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In summary, if we define 



^ii'Ta) = 2;„ - ^ £a(J—Xfj 



P^a 



rp 



(C2I) 



then Equation C20 reveals that Equation 5 is to leading order an optimal estimator, provided we use V' in lieu of V . 
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